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An  algorithm  is  presented  for  the  rapid  evaluation  of  the  potential  and  force  fields  in  systems 
involving  large  numbers  of  particles  whose  interactions  are  Coulombic  or  gravitational  in  nature. 
For  a  system  of  N  particles,  an  amount  of  work  of  the  order  0(iVa)  has  traditionally  been  required 
to  evaluate  all  pairwise  interactions,  unless  some  approximation  or  truncation  method  is  used. 
The  algorithm  of  the  present  paper  requires  an  amount  of  work  proportional  to  N  to  evaluate  all 
interactions  to  within  roundoff  error,  making  it  considerably  more  practical  for  large-scale  problems 
encountered  in  plasma  physics,  fluid  dynamics,  molecular  dynamics  and  celestial  mechanics. 
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1.  Introduction 

The  study  of  physical  systems  by  means  of  particle  simulations  is  well-established  in  a  number 
of  fields  and  becoming  increasingly  important  in  others.  The  most  classical  example  is  probably 
celestial  mechanics,  but  much  recent  work  has  been  done  in  formulating  and  studying  particle 
models  in  plasma  physics,  fluid  dynamics  and  molecular  dynamics  [4]. 

There  are  two  major  classes  of  simulation  methods.  Dynamical  simulations  follow  the  trajec¬ 
tories  of  N  particles  over  some  time  interval  of  interest.  Given  initial  positions  {z,}  and  velocities, 
the  trajectory  of  each  particle  is  governed  by  Newton’s  second  law  of  motion: 

A [2  jr  • 

m,-  -jp-  =  -  V($>  for  i  =  1, ...,  N  , 

where  m,-  is  the  mass  of  ilh  particle,  and  the  force  is  obtained  from  the  gradient  of  a  potential 
function  When  one  is  interested  in  an  equilibrium  configuration  of  a  set  of  particles  rather  than 
their  time-dependent  properties,  an  alternative  approach  is  the  Monte  Carlo  method.  In  this  case, 
the  potential  function  $  has  to  be  evaluated  for  a  large  number  of  configurations  in  an  attempt  to 
determine  the  potential  minimum. 

We  restrict  our  attention  in  this  paper  to  the  case  where  the  potential  (or  force)  at  a  point  is 
a  sum  of  pairwise  interactions.  More  specifically,  we  consider  potentials  of  the  form 


$  —  Qfar  d"  (^near  "b  ^ei/erna/)> 

where  <$„ear  (when  present)  is  a  rapidly  decaying  potential  (e.g.  Van  der  Waals),  $exttrnai  (when 
present)  is  independent  of  the  number  of  particles,  and  $/ar,  the  far-field  potential,  is  Coulombic 
or  gravitational.  Such  models  describe  classical  celestial  mechanics  and  many  problems  in  plasma 
physics  and  molecular  dynamics.  In  the  vortex  method  for  incompressible  fluid  flow  calculations 
[3],  an  important  and  expensive  portion  of  the  computation  has  the  same  formal  structure  (the 
stream  function  and  the  vorticity  are  related  by  Poisson’s  equation). 

In  a  system  of  N  particles,  the  calculation  of  $neor  requires  an  amount  of  work  proportional 
to  N,  as  does  the  calculation  of  $eztcmai-  The  decay  of  the  Coulombic  or  gravitational  potential, 
however,  is  sufficiently  slow  that  all  interactions  must  be  accounted  for,  resulting  in  CPU  time 
requirements  of  the  order  0(iV2).  In  this  paper  a  method  is  presented  for  the  rapid  (order  O(N)) 
evaluation  of  these  interactions  for  all  particles. 

There  have  been  a  number  of  previous  efforts  aimed  at  reducing  the  computational  complexity 
of  the  iV-body  problem.  Particle-in-cell  methods  [4]  have  received  careful  study  and  are  used  with 
much  success,  most  notably  in  plasma  physics.  Assuming  the  potential  satisfies  Poisson’s  equation, 
a  regular  mesh  is  laved  out  over  the  computational  domain  and  the  method  proceeds  by: 


(1)  interpolating  the  source  density  at  mesh  points, 

(2)  using  a  ’‘fast  Poisson  solver”  to  obtain  potential  values  on  the  mesh, 

(3)  computing  the  force  from  the  potential  and  interpolating  to  the  particle  positions. 


The  complexity  of  these  methods  is  of  the  order  0(N  +  M  logM ),  where  M  is  the  number  of 
grid  points.  The  number  of  grid  points  is  usually  chosen  to  be  of  the  same  order  as  the  number 
of  particles,  and  the  resulting  complexity  estimate  for  the  method  is  0(.V  logN).  Unfortunately, 
the  grid  provides  limited  resolution,  and  highly  non-uniform  source  distributions  cause  a  significant 
degradation  of  performance.  Further  errors  are  introduced  in  step  (3)  by  the  necessity  for  numerical 
differentiation  to  obtain  the  force. 

To  improve  the  accuracy  of  particle-in-cell  calculations,  short-range  interactions  can  be  handled 
by  direct  computation,  while  far-field  interactions  are  obtained  from  the  mesh,  giving  rise  to  so- 
called  particle-particle/particle-mesh  ( P3\I )  methods  [4].  An  adaptation  of  these  techniques  to 
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vortex  method  calculations  has  recently  been  carried  out  by  Anderson  [l] .  These  algorithms  improve 
the  local  accuracy  of  the  calculations,  but  their  computational  complexity  remains  of  the  order 
0{N  logN),  and  they  still  depend  for  their  efficient  performance  on  a  resonably  uniform  distribution 
of  particles. 

Appel  [2]  introduced  a  “grid-less”  method  for  many-body  simulation  with  a  computational 
complexity  estimated  to  be  of  the  order  0(N  logN).  It  relies  on  using  a  monopole  (center-of-mass) 
approximation  for  computing  forces  over  large  distances  and  sophisticated  data  structures  to  keep 
track  of  which  particles  are  sufficiently  clustered  to  make  the  approximation  valid.  For  certain  types 
of  problems,  the  method  achieves  a  dramatic  speed-up  compared  to  the  naive  0{N 2)  approach.  It 
is  less  efficient  when  the  distribution  of  particles  is  relatively  uniform  and  the  required  precision  is 
high. 

The  algorithm  we  present  uses  multipole  expansions  to  compute  potentials  or  forces  to  whatever 
precision  is  required,  and  the  CPU  time  expended  is  proportional  to  jV.  The  approach  we  use  is 
similar  to  the  one  introduced  in  [6]  for  the  solution  of  boundary  value  problems  for  the  Laplace 
equation.  In  the  following  section,  we  describe  the  necessary  analytical  tools,  while  section  (3)  is 
devoted  to  a  detailed  description  of  the  method. 

2.  Physical  and  Mathematical  Preliminaries 

In  this  paper,  we  consider  a  two-dimensional  physical  model  which  consists  of  a  set  of  N  charged 
particles  with  the  potential  and  force  obtained  as  the  sum  of  pairwise  interactions  from  Coulomb’s 
law.  Suppose  that  a  point  charge  of  unit  strength  is  located  at  the  point  (zo,yo)  =  xo  6  R 2.  Then, 
for  any  x  =  (z,  y)  €  R2  with  x  ^  Xo,  the  potential  and  electrostatic  field  due  to  this  charge  are 
described  by  the  expressions 


<i>x  o(x,y)  =  -fojOJx-Xo 

E  (x  v )  -  (X~Xq) 

II  X  -  xo  il2 


and 


respectively. 

It  is  well-known  that  4>Xo  is  harmonic  in  any  region  not  containing  the  point  xo.  Moreover, 
for  every  harmonic  function  u,  there  exists  an  analytic  function  w  :  Cl  Cl  such  that  u(z,  y)  = 
Re(w(x,y)),  and  w  is  unique  except  for  an  additive  constant.  In  the  remainder  of  the  paper  we 
will  work  with  analytic  functions,  making  no  distinction  between  a  point  (z,  y)  e  R2  and  a  point 
x  +  iy  =  z  €  Cl .  We  note  that 


<t>x o(x)  =  Re(log(z  -  z0)), 

and,  following  standard  practice,  we  will  refer  to  the  analytic  function  log{z)  as  the  potential 
due  to  a  charge.  As  we  develop  expressions  for  the  potential  due  to  more  complicated  charge 
distributions,  we  will  continue  to  use  complex  notation,  and  will  refer  to  the  corresponding  analytic 
functions  themselves  as  the  potentials.  The  following  lemma  is  an  immediate  consequence  of  the 
Cauchy-Riemann  equations. 

Lemma  2.1.  If  u(x,y)  =  Re( w(x,y))  describes  the  potential  held  at  (x,y),  then  the  corresponding 
force  held  is  given  by 

Vn  =  ( uz ,  uy)  =  (Re(w'),  -Im(  :r')), 
where  w'  is  the  derivative  of  w. 

The  following  lemma  is  used  in  obtaining  the  multipole  expansion  for  the  field  due  to  m  charges. 
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Lemma  2.2.  Let  a  point  charge  of  intensity  q  be  located  at  zq.  Then  for  any  z  such  that  |z|  >  |zo|, 


<t>z0{z)  =  9  log{z  -  z0)  =  q  ^log(z)  ”  £  ^  (7)  j  •  (2-1) 

Proof.  Note  first  that  log(z  -  zo)  -  log{z )  =  log  (l  -  &■)  and  that  |^|  <  1.  The  lemma  now  follows 
from  the  expansion 

log{l-  w)  =  (-l)X^T'  ’ 

fc=i  K 

which  is  valid  for  any  w  such  that  |w|  <  1. 


Theorem  2.1.  (Multipole  Expansion).  Suppose  that  m  charges  of  strengths  {g,-,  i  =  are 

located  at  points  (z,-,  i  =  with  |z,|  <  r.  Then  for  any  z  €  C1  with  |z|  >  r,  the  potential 

<f>(z)  is  given  by 

OO 

4>(z)  =  Q  log[z)  +  Y,  JZ’  (2-2) 


where 


m  k 

<?  =  !>  and  a*  =  £ 

i=l  i=l 

Furthermore,  for  any  p  >  1, 

O(z)  -  Q  Ic^z) <«j:|'+‘<(-4T)  ()) 


where 


c=|;|  ’  A  =  X^I<7.i  >  and  Q:=7rrz] 


Proof.  The  form  of  the  multipole  expansion  (2.2)  is  an  immediate  consequence  of  the  preceding 
lemma  and  the  fact  that  <j>{z)  =  ]C£Li  ^zA2)-  To  obtain  the  error  bound  (2.4),  observe  that 


p 

OO 

4>{z)  -  Q  log(z)  -  Yl  7  = 

7  a* 
Z-  ,k 

*=1 

k=p+l 

Substituting  for  a*  the  expression  in  (2.3),  we  have 

OO  OO  Ic  OO  ,  „  L. 

E  3  S-'E  E  (;)  —If 

*=p  + 1  Jfc=p+1  k=p+l 


r\k  r  ]P+1  (  .4  \  f  l  \  p 

l)  =a  Z\  ~  {—lj  \c) 


In  particular,  if  c  >  2.  then 


-X3  <■*(;)'■ 


3 


(2.6) 


Finally,  we  demonstrate  with  a  simple  example  how  multipole  expansions  can  be  used  to  speed 
up  calculations  with  potential  fields.  Suppose  that  charges  of  strengths  ?i,?2 ,  — ,?m  are  located  at 
the  points  xi,xjt...,xm  6  Cl  and  that  {yi,j/2i  ...,yn}  is  another  set  of  points  in  Cl  (Figure  1).  We 
say  that  the  sets  {x,}  and  {y,}  are  well-separated,  if  there  exist  points  xq,  yo  6  C1  and  a  real  r  >  0 
such  that 

( Xi  —  io|  <  r  for  all »  =  , 

|y j  -  yo|  <  r  for  all  j  =  1, n  ,  and 

|*o  -  yol  >  3 r. 

In  order  to  obtain  the  potential  (or  force)  at  the  points  {yy}  due  to  the  charges  at  the  points 
{x,}  directly,  we  could  compute 

m 

^^ii(yy)  for  all  j  as  1, ...,  n.  (2.7) 

i=i 

This  clearly  requires  order  nm  work  (evaluating  m  fields  at  n  points).  Now  suppose  that  we 
first  compute  the  coefficients  of  a  p-term  multipole  expansion  of  the  potential  due  to  the  charges 
<7ii?2i  ••■iQm  about  xo,  using  Theorem  2.1.  This  requires  a  number  of  operations  proportional  to 
mp.  Evaluating  the  resulting  multipole  expansion  at  all  points  yy  requires  order  np  work,  and  the 
total  amount  of  computation  is  of  the  order  0(mp+  np).  Moreover,  by  (2.6), 

E  Mn)  -  Q  '<*(»  -  *o)  -  E  |W  !‘t0|»  «  A  (1)  ■ 

and  in  order  to  obtain  a  relative  precision  «  (with  respect  to  the  total  charge),  p  must  be  of  the 
order  -log^ie).  Once  the  precision  is  specified,  the  amount  of  computation  has  been  reduced  to 

0(m)  +  0{n)  , 

which  is  significantly  smaller  than  nm  for  large  n  and  m. 

2.1.  Translation  Operators  and  Error  Bounds 

The  following  three  lemmas  constitute  the  principal  analytical  tool  of  this  paper,  allowing  us 
to  manipulate  multipole  expansions  in  the  manner  required  by  the  fast  algorithm.  Lemma  2.3 
provides  a  formula  for  shifting  the  center  of  a  multipole  expansion,  lemma  2.4  describes  how  to 
convert  such  an  expansion  into  a  local  (Taylor)  expansion  in  a  circular  region  of  analyticity,  and 
lemma  2.5  furnishes  a  mechanism  for  shifting  the  center  of  a  Taylor  expansion  within  a  region  of 
analyticity.  We  also  derive  error  bounds  associated  with  these  translation  operators  which  allow 
us  to  carry  out  numerical  computations  to  any  specified  accuracy. 

Lemma  2.3.  Suppose  that 

=  00  log{z  -  *o)  +  77  \k  (2-8) 

fc-i  “o) 

is  a  multipole  expansion  of  the  potential  due  to  a  set  of  m  charges  of  strengths  qj,  qj, . . . ,  qm,  all  of 
which  are  located  inside  the  circle  D  of  radius  R  with  center  at  zq.  Then  for  z  outside  the  circle 
D\  of  radius  ( R  +  |zo|)  and  center  at  the  origin, 

<*>{z)  =aolog{z)  +  Yijr 

;=i  2 
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where 


b,  =  ^2akz‘0~k 
*= 1 


with  (!)  ciie  binomial  coefficients.  Furthermore,  for  any  p  >  1, 


(2.10) 


<t>(z)  -  a0  log(z)  - 


< 


Ifoj  +  R 
z 


P+1 


(2.11) 


with  A  defined  in  (2.5). 

Proof.  The  coefficients  of  the  shifted  expansion  (2.9)  are  obtained  by  expanding  into  a  Taylor  series 
the  expression  (2.8)  with  respect  to  zo-  For  the  error  bound  (2.11),  observe  that  the  terms  {£>(}  are 
the  coefficients  of  the  (unique)  multipole  expansion  about  the  origin  of  those  charges  contained  in 
the  circle  D,  and  Theorem  2.1  applies  immediately  with  r  replaced  by  |zo|  +  R. 


Remark:  Once  the  values  {ao,ai,  ...,ap}  in  the  expansion  (2.8)  about  z<>  are  computed,  we  can 
obtain  {f>i, ...,  bp}  exactly  by  (2.10).  In  other  words,  we  may  shift  the  center  of  a  truncated  multipole 
expansion  without  any  loss  of  precision. 

Lemma  2.4.  Suppose  that  m  charges  of  strengths  <71,172,  •••,9m  are  located  inside  the  circle  D\  with 
radius  R  and  center  at  zo,  and  that  |zq|  >  (c+  l)i?  with  c  >  1.  (Figure  2.)  Then  the  corresponding 
multipole  expansion  (2.8)  converges  inside  the  circle  D?  of  radius  R  centered  about  the  origin. 
Inside  Di,  the  potential  due  to  the  charges  is  described  by  a  power  series: 


OO 


*(*)  -J2bi  z'' 
1-0 


(2.12) 


where 


and 


OO 


bo  =  X]  +  a°  l°9{-zo), 

km  1  ’O 


for  l  >  1. 


(2.13) 


(2.14) 


Furthermore,  for  any  p  >  max 


an  error  bound  for  the  truncated  series  is  given  by 


®(*)  -J2b,  z‘ 

1=0 

where  A  is  defined  in  (2.5)  and  e  is  the  base  of  natural  logarithms. 

Proof.  We  obtain  the  coefficients  of  the  local  expansion  (2.12)  from  MacLaurin's  theorem  applied 
to  the  multipole  expansion  (2.3).  To  derive  the  error  bound  (2.15).  we  let  *ro  =  00  log(-zo). 
=  -( -24.)  for  l  >  1.  and  3i  —  bi  -  ~u  for  /  >  0.  Then 

*  ~n 


A(4e[p  +  c)(c  +  1)  +c2)  /  iy+l 
c(c  -  1)  \cj 


p  !  JO 

o(z)  -  ^^6;  z(j  =  '  6/  z1'  <  S 1  -*-52  (2. 1C) 

1=0  :  '<=?*>  i 


5 


with 


A  bound  for  5!  is  easily  found  by  observing  that 


00  J 


°°  J 


=  E11 '  -  Y  ^ A  Y  fLj 

lmp+l  lmp+l  *0  J*p+ 1  *0 

To  obtain  a  bound  for  5 2.  let  C  be  a  circle  of  radius  s  where  s  =  cR  (Figure  2).  Note  first 

that  for  any  p  >  ^ , 

cR  +  R 

R  <  — - —  <  s  <  cR. 

2 

Defining  the  function  <t>\  :  Cl  \  D\  —  C1  by  the  expression 

<M~)  =  6{z)  -  a0  ■  log(z  -  z0 ), 

and  using  Taylor’s  theorem  for  complex  analytic  functions  (see  [5],  p.  190),  we  obtain 

s,  =  M.-)-X>*'  1=  f>-' 

1=0  l=p+l  1  ~  ,  V  ' 


where 


Obviously,  for  any  t  lying  on  C. 


and  it  is  easv  to  see  that 


M  =  max  \<p\  (f)|. 


i»i«)i<E 

.  .  *  ~n 


|<it|  <  ARk  and  |<  -  ;o|  >  R  +  cR  -  s  =  R  H - . 


.\fter  some  algebraic  manipulation,  we  have 


.1/  <  A 


pR  +•  cR 


a,"‘  1  -  T  s  TrTr- 


Observing  that  for  any  positive  integer  n  and  any  integer  p  >  2. 


+  -  < 


we  obtain 


A(pR  +  cR)(cR  +  R)  ( |£iy+I  /_p_\p+1 
2  “  cR{cR-R)  \cRj  \p-l) 


K  A(p  +  c)(c  +  i)  fiy+1  f 
-  c(c-  1)  \c)  \ 

p+i 


1  + 


P  ~  1 


p-i 


1  + 


P  ~  1 


^  4 Ae(p  +  c)  (c  +  1)  / 1 V 

c(c-l)  U/ 

Adding  the  last  expression  to  the  error  bound  for  5 1  completes  the  proof. 


I 


The  following  lemma  is  an  immediate  consequence  of  MacLaurin’s  theorem.  It  describes  an 
exact  translation  operation  with  a  finite  number  of  terms,  and  no  error  bound  is  needed. 

Lemma  2.5.  For  any  complex  zq,  z  and  {a*},  k  =  1, 2, . . . ,  n, 


n 


Jfe=0 


■«>*-££ 


at 


!=> 0  \k=i 


(-*>) 


k-l 


(2.17) 


3.  The  fast  multipole  algorithm 

In  this  section,  we  present  an  algorithm  for  the  rapid  evaluation  of  the  potentials  and/or 
electrostatic  fields  due  to  distributions  of  charges.  The  central  strategy  used  is  that  of  clustering 
particles  at  various  spatial  lengths  and  computing  interactions  with  other  clusters  which  are  suf¬ 
ficiently  far  away  by  means  of  multipole  expansions.  Interactions  with  particles  which  are  nearby 
are  handled  directly. 

To  be  more  specific,  let  us  consider  the  geometry  of  the  computational  box,  depicted  in  Figure 
3.  It  is  a  square  with  sides  of  length  one,  centered  about  the  origin  of  the  coordinate  system,  and  is 
assumed  to  contain  all  N  particles  of  the  system  under  consideration.  The  eight  nearest  neighbor 
boxes  are  also  shown,  and  will  be  needed  in  the  next  section  when  considering  various  boundary 
conditions.  First,  we  will  describe  the  method  for  free-space  problems ,  where  the  boundary  can  be 
ignored,  and  the  only  interactions  to  be  accounted  for  involve  particles  within  the  computational 
box  itself. 

Fixing  a  precision  e,  we  choose  p  ss  log2{t)  and  specify  that  no  interactions  be  computed  for 
clusters  of  particles  which  are  not  well-separated.  This  is  precisely  the  condition  needed  for  the 
error  bounds  (2. 4). (2. 11)  and  (2.15)  to  apoly  with  c  —  2,  the  truncation  error  to  be  bounded  by 
2~p.  and  the  desired  precision  to  be  achieved.  In  order  to  impose  such  a  condition,  we  introduce 
a  hierarchy  of  meshes  which  refine  the  computational  box  into  smaller  and  smaller  regions  (Figure 
4).  Mesh  level  0  is  equivalent  to  the  entire  box,  while  mesh  level  /  4-  1  is  obtained  from  level  l  by 
subdivision  of  each  region  into  four  equal  parts.  The  number  of  distinct  boxes  at  mesh  level  l  is 
equal  to  4*.  A  tree  structure  is  imposed  on  this  mesh  hierarchy,  so  that  if  ! box  is  a  fixed  box  at 
level  l.  the  four  boxes  at  level  l  +  1  obtained  by  subdivision  of  ibox  are  considered  its  children. 

Other  notation  used  in  the  description  of  the  algorithm  includes 

<I>, ,  the  p-term  multipole  expansion  (about  the  box  center)  of  the  potential  field 

created  by  the  particles  contained  inside  box  i  at  level  /. 

'Fj ,  the  p-term  local  expansion  about  the  center  of  box  i  at  level  l.  describing  the 

potential  field  due  to  all  particles  outside  the  box  and  its  nearest  neighbors. 
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the  p-term  local  expansion  about  the  center  of  box  i  at  level  l ,  describing 
the  potential  field  due  to  all  particles  outside  t’s  parent  box  and  the  parent 
box’s  nearest  neighbors. 

Interaction  list:  for  box  i  at  level  l,  it  is  the  set  of  boxes  which  are  children  of  the  nearest 
neighbors  of  i’s  parent  and  which  are  well-separated  from  box  i  (Figure  5). 

Suppose  now  that  at  level  l  —  1,  the  local  expansion  'I'f-i,,-  has  somehow  been  obtained  for  all 
boxes.  Then,  by  using  lemma  2.5  to  shift  (for  all  t)  the  expansion  to  each  of  box  i' s  children  , 

we  have,  for  each  box  j  at  level  l,  a  local  representation  of  the  potential  due  to  all  particles  outside 
of  j's  parent’s  neighbors,  namely  The  interaction  list  is,  therefore,  precisely  that  set  of  boxes 
whose  contribution  to  the  potential  must  be  added  to  ’i'j j  in  order  to  create  'F ij .  This  is  done  by 
using  lemma  2.4  to  convert  the  multipole  expansions  of  these  interaction  boxes  to  local  expansions 
about  the  current  box  center  and  adding  them  to  the  expansion  obtained  from  the  parent.  Note 
also  that  with  free-space  boundary  conditions,  'Fo.i  and  are  equal  to  zero  since  there  are  no 
well-separated  boxes  to  consider,  and  we  can  begin  forming  local  expansions  at  level  2. 

Following  is  a  formal  description  of  the  algorithm. 


Initialization 


Algorithm 


Choose  a  level  of  refinement  n  a?  log^N,  a  precision  €,  and  set  p  %  log^e). 

Upward  Pass 

Step  1 

Comment  [  Form  multipole  expansions  of  potential  field  due  to  particles 
in  each  box  about  the  box  center  at  the  finest  mesh  level.] 


do  ibox  —  1, ....  4" 

Form  a  p-term  multipole  expansion  $n,i6on  by  using  Theorem  2.1. 
enddo 

Step  2 

Comment  [  Form  multipole  expansions  about  the  centers  of  all  boxes 

at  all  coarser  mesh  levels,  each  expansion  representing  the  potential 
field  due  to  all  particles  contained  in  one  box.  ] 

do  l  =  n  —  1 . 0 

do  i box  =  1 . 4; 

Form  a  p-term  multipole  expansion  by  using 

lemma  2.3  to  shift  the  center  of  »ach  child  box's  expansion 
to  the  current  box  center  and  adding  them  together. 

enddo 

enddo 


Downward  Pass 


Comment  [  In  the  downward  pass,  interactions  are  consistently  computed 
at  the  coarsest  possible  level.  For  a  given  box,  this  is  accomplished 
by  including  interactions  with  those  boxes  which  are  well-separated 
and  whose  interactions  have  not  been  accounted  for  at  the  parent’s 
level.  ] 

Step  3 

Comment  [  Form  a  local  expansion  about  the  center  of  each  box  at  each  mesh  level  l  <  n  — 
This  local  expansion  describes  the  field  due  to  all  particles  in  the 
system  that  are  not  contained  in  the  current  box  or  its  nearest  neighbors.  Once 
the  local  expansion  is  obtained  for  a  given  box,  it  is  shifted,  in  the  second 
inner  loop  to  the  centers  of  the  box’s  children,  forming  the  initial  expansion 
for  the  boxes  at  the  next  level.  ] 

Set  =  ^1,2  =  $1.3  = 
do  /  =  1, ...,  n  —  1 
do  ibox  =  1, ...,  4 1 

Form  Vi'i box  by  using  lemma  2.4  to  convert  the  multipole  expansion 
<&tj  of  each  box  j  in  interaction  list  of  box  ibox 
to  a  local  expansion  about  the  center  of  box  ibox,  adding  these 
local  expansions  together,  and  adding  the  result  to  ^i,,box- 
enddo 

do  ibox  =  1. ...,  4l 

Form  the  expansion  'J'j+i.j  for  ibox's  children 

by  using  lemma  2.5  to  expand  <S/t,n>oz  about  the  children’s  box  centers. 

enddo 

enddo 


Step  4 

Comment  [  Compute  interactions  at  finest  mesh  level  ] 
do  ibox  =  1, ...,  4" 

Form  '&n,ibox  by  using  lemma  3  to  convert  the  multipole  expansion  of 
each  box  in  interaction  list  to  a  local  expansion  about  the  center  of  box  ibox, 
adding  these  local  expansions  together,  and  adding  the  result  to  the 
local  expansion  obtained  from  t&ox’s  parent. 

enddo 


Comment  [  Local  expansions  at  finest  mesh  level  are  now  available. 

They  can  be  used  to  generate  the  potential  or  force  due  to  all 
particles  outside  the  nearest  neighbor  boxes  at  finest  mesh  level.  ] 

Step  5 

Comment  [  Evaluate  local  expansions  at  particle  positions.  ] 
do  ibox  =  1 . 4n 

For  every  particle  pj  located  at  the  point  in  box  ibox, 
evaluate  <P„,,b0x(zj)- 

enddo 


0 


Step  6 

Comment  [  Compute  potential  (or  force)  due  to  nearest  neighbors  directly.  ] 
do  ibox  =  1, 4" 

For  every  particle  py  in  box  ibox,  compute  interactions  with 
all  other  particles  within  the  box  and  its  nearest  neighbors. 

enddo 

Step  T 

do  ibox  =  1, 4" 

For  every  particle  in  box  ibox,  add  direct  and  far-field  terms  together. 

enddo 


Remark:  Each  local  expansion  is  described  by  the  coefficients  of  a  p-term  polynomial.  Direct 
evaluation  of  this  polynomial  at  a  point  yields  the  potential.  But,  by  lemma  2.1,  the  force  is 
immediately  obtained  from  the  derivative  which  is  available  analytically.  There  is  no  need  for 
numerical  differentiation.  Furthermore,  due  to  the  anlyticity  of  there  exist  error  bounds  for  the 
force  of  exactly  the  same  form  as  (2. 4), (2. 11)  and  (2.15). 


A  brief  analysis  of  the  algorithmic  complexity  is  given  below. 


Step  Number 

Operation  Count 

Explanation 

Step  1 

order  .Vp 

each  particle  contributes  to 
one  expansion  at  the  finest 
level. 

Step  2 

order  Np2 

At  the  Ith  level.  4( 
shifts  involving  order  p2 
work  per  shift  must  be 
performed. 

Step  3 

order  <  23Np2 

There  are  at  most  27  entries 
in  the  interaction  list  for 
each  box  at  each  level. 

An  extra  order  .Vp2  work 
is  required  for  the  second  loop 

Step  4 

order  <  27  .Vp2 

Again,  there  are  at  most  27 
entries  in  the  interaction 
list  for  each  box  and 
.V  boxes. 

Step  5 

order  .Vp 

One  p-term  expansion  is 
evaluated  for  each  particle. 

Step  6 

order  2Nk„ 

Let  kn  be  a  bound  on  the 
number  of  particles  per  box 

LO 


at  the  finest  mesh  level. 
Interactions  must  be 
computed  within  the  box 
and  its  eight  nearest  neighbors, 
but  using  Newton’s  third  law, 
we  need  only  compute  half 
of  the  pairwise  interactions. 

Step  7  order  N  Adding  two  terms  for 

each  particle. 


The  estimate  for  the  running  time  is  therefore 

N  (-2 a  logi{()  +  56 b  ( logi[e ))2  +  4.5  d  k„  +  e) 

with  the  constants  a,b,c,d,  and  e  determined  by  the  computer  system,  language,  implementation, 
etc. 

Remark:  It  is  clear  that  the  operation  count  for  Step  6  assumes  a  reasonably  homogeneous  distri¬ 
bution  of  particles.  If  the  distribution  were  highly  non-homogeneous,  then  we  would  need  to  refine 
only  those  portions  of  space  where  the  number  of  particles  is  large.  Although  its  description  is 
more  involved,  an  adaptive  version  retains  both  the  accuracy  and  the  computational  speed  of  the 
algorithm. 

4.  Boundary  Conditions 

A  variety  of  more  complicated  boundary  conditions  are  used  in  particle  simulations,  such 
as  periodic  boundary  conditions,  homogeneous  Dirichlet  conditions,  and  homogeneous  Neumann 
conditions.  Here,  only  the  periodic  case  will  be  treated  in  detail,  the  treatment  of  the  other  two 
cases  being  quite  similar. 

We  start  by  reconsidering  the  computational  domain  depicted  in  Figure  3.  At  the  end  of  the 
upward  pass  of  the  algorithm,  we  have  a  net  multipole  expansion 

$o.i  (4J) 

jt=i 5 

for  the  entire  computational  box.  This  is  then  the  expansion  for  each  of  the  periodic  images  of 
the  box  with  respect  to  its  own  center.  All  of  these  images  except  for  the  ones  depicted  in  Figure 
3  are  well- separated  from  the  computational  box  itself,  and  their  induced  fields  are  accurately 
representable  by  a  p-term  local  expansion,  where  as  before,  p  -log2{e)  is  the  number  of  terms 
needed  to  achieve  a  relative  precision  e.  We  assume  that  the  periodic  particle  model  has  no  net 
charge  and,  therefore,  that  the  local  representation  given  by  lemma  2.4  can  be  written  as 

$0.!  =^2^  Zm  (4.2) 

m=  1 


n 


with  zo  the  center  of  the  image  box  under  consideration. 

Remark:  In  certain  problems  (e.g.  cosmology),  the  computational  box  obviously  cannot  satisfy  the 
condition  of  no  net  charge  (mass).  This  condition  is  necessary  for  the  potential  to  be  well-defined, 
since  the  logarithmic  term  becomes  unbounded  as  zq  — +  oo.  Force  calculations,  however,  may  still 
be  carried  out.  Indeed,  using  the  notation  of  the  algorithm,  V t,i  are  expansions  of  analytic 

functions  representing  the  potential,  so  that  their  derivatives  are  also  analytic  functions  (with  the 
same  regions  of  analyticity).  Moreover,  it  is  clear  from  Theorem  2.1  that  the  derivatives  ${  .  are 
described  by  pure  inverse  power  series.  Therefore,  the  identical  formal  structure  of  the  algorithm 
can,  due  to  lemma  2.1,  be  used  to  evaluate  force  fields  everywhere,  bypassing  the  difficulty  intro¬ 
duced  by  the  logarithmic  term.  The  only  change  required  is  that  the  initial  expansions  computed 
be  the  derivatives  of  the  multipole  expansions  and  not  the  multipole  expansions  themselves. 

Note  now  that  well-separated  images  of  the  computational  cell  are  boxes  whose  centers  zo  have 
integer  real  and  imaginary  parts,  with  Re(zo)  >  2  or  Im(zo)  >  2.  Let  5  be  the  set  of  such  centers. 
To  account  for  the  field  due  to  all  well-separated  images,  we  form  the  coefficients  for  the  local 
representation  by  adding  the  local  shifted  expansions  of  the  form  (4.3)  for  all  zo  6  5  to  obtain 

The  summation  over  S  for  each  inverse  power  of  zo  can  be  precomputed  and  stored.  For 
(m  +  k)  >  2,  the  series  is  absolutely  convergent.  However,  for  (m  +  k)  <  2,  the  series  is  not 
absolutely  convergent,  and  the  computed  value  depends  on  the  order  of  addition.  Choosing  a 
reasonable  value  for  the  sum  of  the  series  requires  careful  consideration  of  the  physical  model. 

Suppose  first  that  the  only  particle  in  the  simulation  is  a  charge  of  unit  strength  located  at  the 
origin.  Then  the  periodic  model  corresponds  to  a  uniform  lattice  of  charges,  and  Newton's  third 
law  requires  that  the  net  force  on  each  particle  be  zero.  But  the  net  force  on  the  particle  at  the 
origin  corresponds  to  the  summation  over  5  of  l/^o,  so  that  we  set 


To  determine  a  value  for  the  second  term. 
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suppose  that  the  only  particle  in  the  simulation  is  a  dipole  of  strength  one,  oriented  along  the  x-axis 
and  located  at  the  origin.  Then  the  periodic  model  is  again  a  uniform  lattice  and  the  difference  in 
potential  between  the  equivalent  sites  (—  5,0)  and  (^.0)  must  be  zero;  i.e. 


f, 


(+4.o) 

-4.0) 


F  ■  ds  =  =  0. 


(4.5) 


The  contribution  to  the  potential  difference,  J  F  ■  ds,  of  a  single  dipole  located  at  zo  is 


^  ^  S*  w  *  y  *-*  *-■  *>« 


Thus,  we  find  that  the  potential  difference  due  to  the  original  dipole  located  at  the  origin  is  —4. 
For  an  image  dipole  located  at  ^Oi  with  |zo|  >  1*  we  can  expand  the  contribution  to  /  F  ■  ds  as 
follows: 


r(+h°)  i 
/(-*,  0)  (2“2o)2 


dz  = 


~i  1+(jg)  +  (ii)  +(i^) +- 


~  ,2  +  ,2 

z0  20 


1-  A 


=  -o  + 


4,4  -  ?2 
420  20 


Now  let  S'  be  the  set  of  the  centers  of  all  image  boxes.  That  is.  S'  is  the  set  of  all  points  zo  with 
integer  real  and  imaginary  parts,  excluding  the  origin.  Then 

1  1 


r+i'  ’  r*  ,  v-  1  v-  1 

/.  ,  n.  F  ds  =  ~4  +  E?  +  E  jp  _  22  ■ 

•/  { —  5 ,0)  ci  z0  Cl  *Z0  Z0 


A  somewhat  involved  calculation  shows  that 


E^^nr^4"*  • 
s#  420  *0 


Therefore,  to  satisfy  (4.5),  we  set 


S'  0  S  0  S'\S  ~° 

and  the  sum  J2s'\s  A  *s  easdy  evaluated  and  found  to  be  equal  to  zero.  Therefore,  we  have 


~  *C\ 


and  the  summation  over  5  for  every  inverse  power  of  zo  is  defined. 

The  procedure  of  converting  the  multipole  expansion  of  the  whole  computational  cell  $0,1  into 
a  local  expansion  ^o.i  which  describes  the  potential  field  due  to  all  well-separated  images  can  be 
written,  in  the  notation  of  the  algorithm,  as 

^0,1  =  T  'I’o.i  < 

where  T  is  a  constant  p  by  p  matrix  whose  entries  are  defined  by  the  formula 


T,n.k  = 


„m  +  ifc 
5  -0 
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This  can  be  viewed  as  the  first  step  in  the  downward  pass  of  the  algorithm  for  periodic  boundary 
conditions.  At  this  point,  we  have  accounted  for  all  interactions  excluding  the  ones  within  the 
immediate  neighbors  of  the  computational  box  as  depicted  in  Figure  3.  But  the  expansions 
for  boxes  inside  the  computational  cell  are  also  the  expansions  of  the  corresponding  boxes  inside 
the  nearest  neighbor  images  of  the  computational  cell.  By  adding  to  the  interaction  list  the  appro¬ 
priate  boxes,  we  maintain  the  formal  structure  of  the  algorithm  and  the  associated  computational 
complexity. 

5.  Numerical  Results 

A  computer  program  has  been  implemented  utilizing  the  algorithm  of  the  present  paper  and  ca¬ 
pable  of  handling  both  free-space  and  periodic  boundary  conditions.  The  ability  to  handle  Dirichlet 
and  Neumann  boundary  conditions  will  be  added  in  the  near  future. 

For  testing  purposes,  we  randomly  assigned  charged  particles  to  positions  in  the  computational 
cell  (Figure  7),  with  charge  strengths  between  0  and  1,  and  with  the  numbers  of  particles  varying 
from  100  to  12800.  The  calculations  were  performed  on  a  VAX-8600,  and  the  number  of  terms  in 
the  expansions  <$(,,  was  set  to  20,  guaranteeing  roughly  5-digit  accuracy  of  the  result. 

In  each  case,  we  performed  the  calculation  in  three  ways:  via  the  algorithm  of  the  present  paper 
in  single  precision  arithmetic,  directly  (via  formula  (2.7))  in  single  precision  arithmetic,  and  via 
formula  (2.7)  in  double  precision  arithmetic.  The  first  two  calculations  were  used  to  compare  the 
speed  and  accuracy  of  our  algorithm  to  those  of  the  direct  method.  The  direct  evaluation  of  the 
field  in  double  precision  was  used  as  a  standard  for  comparing  the  relative  accuracies  of  the  first 
two  computations.  In  all  cases,  the  calculation  was  performed  for  a  periodic  model,  the  periodic 
boundary  condition  being  imposed  by  means  of  the  algorithm  described  in  Section  4  of  the  present 
paper. 

The  results  of  these  numerical  experiments  are  summarized  in  Table  I.  Its  first  column  contains 
the  numbers  N  of  particles  for  which  calculations  have  been  performed.  The  second  and  third 
columns  of  Table  1  contain  the  CPU  times  Tatg  that  were  required  by  the  algorithm  of  the  present 
paper  to  obtain  the  fields  at  all  N  particles,  and  the  greatest  relative  error  <5a/s  obtained  at  any 
of  the  particles,  respectively.  Columns  4  and  5  contain  the  CPU  times  Tj, >  that  were  required  by 
the  direct  algorithm  (2.7)  to  obtain  the  fields  at  all  iV  particles,  and  the  greatest  relative  error  <5*> 
obtained  at  any  one  particle,  respectively. 

Remark:  For  the  example  involving  12800  particles,  the  algorithm  of  the  present  paper  required 
about  one  minute  of  CPU  time  (see  Table  1).  However,  it  was  not  considered  practical  to  use 
the  direct  algorithm  to  evaluate  the  field  at  all  12800  points,  since  it  would  take  about  5  hours  of 
CPU  time,  without  producing  much  useful  information.  Therefore,  we  used  the  direct  algorithm  to 
evaluate  the  field  at  only  100  of  12800  particles,  both  in  single  and  double  precision,  and  used  the 
resulting  data  to  estimate  5aig  and  <5<f,r .  The  value  for  r*>  in  this  case  was  estimated  by  scaling. 
The  following  observations  can  be  made  from  Table  1. 

1.  The  accuracy  of  the  results  produced  by  the  algorithm  is  about  the  same  as  that  predicted  by 
the  estimates  (2. 4), (2. 11)  and  (2.15)  for  the  number  of  terms  we  are  using  in  the  expansions 

'lr/ ,,  'P(  l.  There  is  no  evidence  of  accuracy  problems  due  to  truncation  errors. 

2.  The  calculation  time  grows  linearly  with  the  number  of  charges  in  the  model,  even  though  its 
behaviour  is  somewhat  erratic. 

3.  For  as  few  as  1600  particles  in  the  model,  the  computational  effort  required  by  the  direct 
algorithm  is  roughly  40  times  greater  than  that  required  by  the  algorithm  of  the  present  paper. 
For  12,300  particles,  the  effort  is  nearly  300  times  greater. 
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6.  Conclusions 

An  algorithm  has  been  constructed  for  the  rapid  evaluation  of  potential  fields  generated  by  en¬ 
sembles  of  particles  encountered  in  plasma  physics,  molecular  dynamics,  fluid  dynamics  (the  vortex 
method),  and  celestial  mechanics.  The  algorithm  is  applicable  both  in  the  context  of  dynami¬ 
cal  simulations  and  Monte  Carlo  simulations,  provided  that  the  fields  to  be  evaluated  are  either 
Coulombic  in  nature  (for  example,  in  plasma  physics,  molecular  dynamics,  and  celestial  mechan¬ 
ics)  or  can  be  approximated  by  Coulombic  fields  (as,  for  example,  in  the  vortex  method  for  fluid 
dynamics  simulations  (1]).  The  asymptotic  CPU  time  estimate  for  the  algorithm  of  the  present 
paper  is  of  the  order  0(N ),  where  N  is  the  number  of  particles  in  the  simulation,  and  the  numerical 
examples  presented  in  Section  5  indicate  that  even  very  large-scale  problems  result  in  acceptable 
CPU  time  requirements.  In  the  present  paper,  a  two-dimensional  version  of  the  algorithm  is  de¬ 
scribed.  Generalizing  this  result  to  three  dimensions  is  fairly  straightforward,  and  will  be  reported 
at  a  later  date. 

It  is  the  authors'  pleasure  to  thank  Professor  M.H.  Schultz  for  drawing  their  attention  to  the 
subject  of  this  paper,  and  for  his  continuing  interest  and  support. 
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Figure  3:  The  computational  box  (shaded)  and  its  nearest 
periodic  images.  The  box  is  centered  at  the  origin  "0”  and 
has  area  one. 


